Search Improves Label for Active Learning
Abstract
We investigate active learning with access to two distinct oracles: LABEL (which is standard) and SEARCH (which is not). The SEARCH oracle models the situation where a human searches a database to seed or counterexample an existing solution. SEARCH is stronger than LABEL while being natural to implement in many situations. We show that an algorithm using both oracles can provide exponentially large problem-dependent improvements over LABEL alone.

Introduction

Traditional active learning uses selective sampling with a LABEL oracle: the learning algorithm provides an unlabeled example to the oracle, and the oracle responds with a (possibly noisy) label. Using LABEL in an active learning algorithm is known to give (possibly exponentially large) problem-dependent improvements in label complexity, even in agnostic settings where no assumption is made about the labeling mechanism (e.g., Balcan et al., 2006; Hanneke, 2007; 2014).

A well-known deficiency of LABEL arises in the presence of rare classes in classification problems, frequently the case in practice (Attenberg and Provost, 2010). Class imbalance may be so extreme that simply finding an example from the rare class can exhaust the labeling budget. A good illustration is the problem of learning interval functions on [0, 1]. Any LABEL-only active learner needs at least Ω(1/ε) LABEL queries to learn an arbitrary target interval with error at most ε (Dasgupta, 2005). Yet as soon as any positive example from the interval is found, the sample complexity of learning intervals collapses to O(log(1/ε)): we can simply binary search for each of the two endpoints (a short code sketch at the end of this introduction makes this concrete). How can this observation be generalized and used effectively?

Searching for examples of the rare class to seed active learning is how this hurdle is dealt with successfully in practice (Attenberg and Provost, 2010). Domain experts are often adept at finding examples of a class by various, often clever, means. When building a hate speech filter, a simple web search can readily produce several positive examples, whereas sending a random batch of unlabeled examples to LABEL is unlikely to produce any positive examples at all. In practice, it is also common to seek counterexamples to a learned predictor. When monitoring the content stream filtered by the current hate speech filter, a human editor may spot an example of hate speech that seeped through the filter. The editors, using all search tools available to them, can be tasked with finding such counterexamples, interactively correcting the learning process.

We define a new oracle, SEARCH, that provides counterexamples to version spaces. Given a set of possible classifiers H mapping unlabeled points to labels, a version space V ⊆ H is the subset of classifiers that are plausibly optimal. A counterexample to a version space is a labeled example that every hypothesis in the version space classifies incorrectly. When there is no counterexample to the version space, SEARCH returns ⊥.

Why not counterexample a single classifier? Consider a learned interval classifier on the real line. A valid counterexample to this classifier may be arbitrarily close to an interval endpoint, yielding no useful information. SEARCH formalizes "counterexample away from the decision boundary," avoiding this. The learning algorithm must thus guide the search effort to the parts of the space where it would be most effective.
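Before turning to how counterexamples are used, here is a minimal Python sketch of the intervals observation above. It is an illustration rather than anything from the paper: the noiseless LABEL oracle and the helper names (make_label_oracle, learn_interval) are assumptions made for the example. One positive seed turns an Ω(1/ε) search problem into two O(log(1/ε)) binary searches.

```python
# Target interval [a, b] inside [0, 1]; LABEL answers +1 inside, -1 outside.
def make_label_oracle(a, b):
    return lambda z: +1 if a <= z <= b else -1

def learn_interval(label, seed, tol=1e-3):
    """Given one positive seed inside the target interval, localize both
    endpoints to within tol using about 2 * log2(1/tol) LABEL queries."""
    # Left endpoint is in [0, seed]; maintain the invariant label(hi) == +1.
    lo, hi = 0.0, seed
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if label(mid) == +1:
            hi = mid               # mid is inside the interval
        else:
            lo = mid               # mid is to the left of the interval
    left = hi
    # Right endpoint is in [seed, 1]; maintain the invariant label(lo) == +1.
    lo, hi = seed, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if label(mid) == +1:
            lo = mid
        else:
            hi = mid
    right = lo
    return left, right

# A rare positive class of width 0.01: random LABEL queries need about 100
# draws in expectation just to find one positive example, but given a seed,
# roughly 20 LABEL queries recover both endpoints to within 1e-3.
label = make_label_oracle(0.40, 0.41)
print(learn_interval(label, seed=0.405))    # ~(0.400..., 0.410...)
```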
How can a counterexample to the version space be used? We consider a nested sequence of hypothesis classes of increasing complexity, akin to Structural Risk Minimization (SRM) in passive learning (see, e.g., Vapnik, 1982; Devroye et al., 1996). When SEARCH produces a counterexample to the version space, it gives a proof that the current hypothesis class is too simplistic to solve the problem effectively. We show that this guided increase in hypothesis complexity results in a radically lower LABEL complexity than directly learning on the complex space.

SEARCH can easily model the practice of seeding discussed earlier. If the first hypothesis class in the sequence contains just the constant −1 function, a seed example with label +1 is a counterexample to the version space.

We require that SEARCH always return the label of the best predictor in the nested sequence. For many natural hypothesis sequences, the Bayes optimal classifier is eventually in the sequence. Unlike with LABEL queries, where the labeler has no choice of what to label, here the labeler chooses a counterexample. If a human editor spots a piece of content that seeped through the filter and says that it is unquestionably hate speech, it likely is. Such counterexamples should be consistent with the Bayes optimal predictor for any sensible feature representation.

Balcan and Hanneke (2012) define the Class Conditional Query (CCQ) oracle. A CCQ query specifies a subset of unlabeled examples and a label, and the oracle returns one of the examples in the subset with the specified label, if one exists. While the definition of CCQ does not require the subset to be explicitly enumerated and finite, the motivation and the algorithms proposed in that paper do. In contrast, SEARCH has an implicit domain of all examples satisfying some filter, so search can more plausibly discover relevant counterexamples. The use of SEARCH in this paper is also substantially different from the use of CCQ in Balcan and Hanneke (2012): our motivation is to use SEARCH to assist LABEL, as opposed to using SEARCH alone. This is especially useful when the cost of SEARCH is significantly higher than the cost of LABEL (and class skew is only moderate); we hope to avoid SEARCH queries whenever it is possible to make progress using LABEL queries.

The Relative Power of Oracles

As the intervals example shows, SEARCH can be exponentially more powerful than LABEL. Does it dominate LABEL? Although SEARCH cannot always implement LABEL, we show that it is at least as effective at reducing the region of disagreement of the current version space. The clearest example is learning threshold classifiers H := {hw : w ∈ [0, 1]} in the realizable case, where hw(x) = +1 if w ≤ x ≤ 1, and −1 if 0 ≤ x < w. A simple binary search with LABEL achieves an exponential improvement in query complexity over passive learning. The agreement region of any set of threshold classifiers with thresholds in [wmin, wmax] is [0, wmin) ∪ [wmax, 1]. Since SEARCH is allowed to return any counterexample in the agreement region, there is no mechanism for forcing SEARCH to return the label of a particular point we want. However, this is not needed to achieve logarithmic query complexity with SEARCH: if binary search would query the label of x ∈ [0, 1], we can query SEARCH(Vx), where Vx := {hw ∈ H : w < x}, instead. If SEARCH returns ⊥, we know that the target w∗ ≤ x and can safely reduce the region of disagreement to [0, x). If SEARCH returns a counterexample (x0, −1) with x0 ≥ x, we know that w∗ > x0 and can reduce the region of disagreement to (x0, 1]. This observation holds more generally: for any call to LABEL, we can always construct a call to SEARCH that achieves at least as large a reduction in the region of disagreement (the first sketch below simulates this interaction). Conversely, in the realizable setting, where a zero-error classifier exists in the nested sequence, any call to SEARCH can be simulated with at most two calls to CCQ (the second sketch below); thus, in the realizable setting, CCQ is at least as powerful as SEARCH, and at least as difficult to implement.
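The threshold interaction above can be played out in code. The following is a minimal Python sketch, not the paper's construction: query(x) stands for SEARCH(Vx), and since SEARCH may return any valid counterexample, the midpoint tie-breaking here is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Realizable thresholds on [0, 1]: hw(z) = +1 if z >= w, else -1.
# The queried version space is Vx = {hw : w < x}, whose agreement
# region is [x, 1] with unanimous label +1.

@dataclass
class SearchOracle:
    """Illustrative SEARCH oracle for a hidden target threshold w_star."""
    w_star: float

    def query(self, x: float) -> Optional[Tuple[float, int]]:
        # A counterexample to Vx is any (x0, -1) with x0 >= x whose true
        # label is -1, i.e. any x0 in [x, w_star).
        if self.w_star <= x:
            return None                  # ⊥: no counterexample exists
        x0 = (x + self.w_star) / 2       # one arbitrary valid choice
        return (x0, -1)

def localize(search: SearchOracle, tol: float = 1e-3) -> Tuple[float, float]:
    """Shrink the region of disagreement [lo, hi] around w_star using only
    SEARCH; each query reduces the region at least as much as the LABEL
    query at the midpoint would."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        x = (lo + hi) / 2
        reply = search.query(x)
        if reply is None:
            hi = x                 # ⊥ certifies w_star <= x
        else:
            x0, _ = reply
            lo = x0                # counterexample certifies w_star > x0
    return lo, hi

print(localize(SearchOracle(w_star=0.3)))    # ~(0.299..., 0.300...)
```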
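The two-call CCQ simulation can be sketched similarly. Following the finite-pool setting that the CCQ algorithms above work with, this illustrative version assumes binary labels and an explicitly enumerated pool; ccq and search_via_ccq are hypothetical names, not APIs from either paper.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Example = float
Label = int                     # +1 or -1
Hypothesis = Callable[[Example], Label]

def ccq(pool: Iterable[Example], y: Label,
        true_label: Callable[[Example], Label]) -> Optional[Example]:
    """Hypothetical CCQ oracle over a finite pool: return some example in
    the pool whose true label is y, or None if there is none."""
    return next((x for x in pool if true_label(x) == y), None)

def search_via_ccq(version_space: List[Hypothesis],
                   pool: List[Example],
                   true_label: Callable[[Example], Label]
                   ) -> Optional[Tuple[Example, Label]]:
    """Simulate SEARCH(V) with at most two CCQ calls (realizable case).
    A counterexample is a point of the agreement region that every h in V
    labels -y while the true label is y."""
    agree_pos = [x for x in pool if all(h(x) == +1 for h in version_space)]
    agree_neg = [x for x in pool if all(h(x) == -1 for h in version_space)]
    x = ccq(agree_pos, -1, true_label)   # first CCQ call
    if x is not None:
        return (x, -1)
    x = ccq(agree_neg, +1, true_label)   # second CCQ call
    if x is not None:
        return (x, +1)
    return None                          # ⊥: no counterexample exists

# Tiny usage check with thresholds hw(x) = +1 iff x >= w:
make_h = lambda w: (lambda x, w=w: +1 if x >= w else -1)
V = [make_h(0.4), make_h(0.6)]           # version space: thresholds in [0.4, 0.6]
pool = [i / 10 for i in range(11)]
target = make_h(0.8)                     # true threshold w* = 0.8
print(search_via_ccq(V, pool, target))   # -> (0.6, -1)
```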
Our Results

We propose and analyze a general-purpose agnostic algorithm, LARCH, that uses both SEARCH and LABEL (see Beygelzimer et al., 2016, for details). As an implication of our general theorem, when the target hypothesis is a union of k∗ non-trivial intervals in [0, 1], LARCH makes at most k∗ + log(1/ε) queries to SEARCH and at most Õ((k∗)³ log(1/ε) + (k∗)² log(1/ε)) queries to LABEL, with high probability: an exponential improvement over any LABEL-only active learner.

In practical applications, it is critical to consider the relative cost of implementing the two oracles. We show that an amortized approach to explicitly trading off the use of LABEL and SEARCH yields an algorithm with a good guarantee on the total cost (Beygelzimer et al., 2016).

Discussion

Our results demonstrate that SEARCH can significantly benefit LABEL-based active learning algorithms. Are there less powerful oracles that are as beneficial and still plausible to implement? Another key question is computational efficiency: can the benefits of SEARCH be provided in a computationally efficient, general-purpose manner? Attenberg and Provost (2010) showed that simply finding a set of examples of the rare class to seed supervised learning or LABEL-based active learning is already very powerful empirically. Can we do better with a truly interactive yet efficient algorithm?
Similar resources
Semi-automatic Labeling with Active Learning for Multi-label Image Classification
For multi-label image classification, we use active learning to select example-label pairs for which to acquire labels from experts. The core of active learning is to select the most informative examples and request their labels. Most previous studies of active learning for multi-label classification have two shortcomings. One is that they did not pay enough attention to label correlations. The other shortc...
Online Multi-Label Active Learning for Large-Scale Multimedia Annotation
Existing video search engines have not taken advantage of video content analysis and semantic understanding. Video search in academia uses semantic annotation to approach content-based indexing. We argue this is a promising direction for enabling real content-based video search. However, due to the complexity of both video data and semantic concepts, existing techniques for automatic video ann...
A Naive Bayesian Multi-label Classification Algorithm With Application to Visualize Text Search Results
Search results visualization has emerged as an important research topic due to its application to search engine improvement. From the perspective of machine learning, the task of visualizing text search results fits the multi-label learning framework, in which a document is usually related to multiple category labels. In this paper, a Naïve Bayesian (NB) multi-label classification algorithm is pr...
Beating the Minimax Rate of Active Learning with Prior Knowledge
Active learning refers to the learning protocol where the learner is allowed to choose a subset of instances for labeling. Previous studies have shown that, compared with passive learning, active learning is able to reduce the label complexity exponentially if the data are linearly separable or satisfy the Tsybakov noise condition with parameter κ = 1. In this paper, we propose a novel active l...
Supervision Reduction by Encoding Extra Information about Models, Features and Labels
Learning with limited supervision presents a major challenge to machine learning systems in practice. Fortunately, various types of extra information exist in real-world problems, characterizing the properties of the model space, the feature space and the label space, respectively. With the goal of supervision reduction, this thesis studies the representation, discovery and incorporation of ext...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
Publication date: 2016